Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification

Authors

  • Negin Manavizadeh, Department of Electrical and Electronic Engineering, Nanostructured-Electronic Devices Laboratory, Faculty of Electrical Engineering, K. N. Toosi University of Technology, Tehran, Iran.
  • Tara Ghafouri, Department of Electrical and Electronic Engineering, Nanostructured-Electronic Devices Laboratory, Faculty of Electrical Engineering, K. N. Toosi University of Technology, Tehran, Iran.
Abstract:

Background: In this study, a hybrid feature selection approach combining filter and wrapper methods is applied to several bioscience databases that differ in their numbers of records, attributes, and classes; the strategy thereby gains the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival.
Methods: The feature selection algorithms were modeled in Matlab R2021a during April and May 2022 within the framework of statistical pattern recognition. First, the features are ranked by normalized mutual information, a metric of feature relevance and redundancy, and the feature subset with the highest classification accuracy is then selected. Two feature selection algorithms, namely inclusion of features that enhance the classification accuracy and exclusion of irrelevant features, are applied to the datasets of interest after mini-batch k-means clustering of the records.
Results: After both feature selection methods are executed, evaluation metrics including accuracy, precision, recall, and F1 score are measured and compared. For the molecular biology, hepatitis C virus (HCV), and E. coli bacteria datasets, both proposed feature selection approaches yield precision and recall scores above 98 percent, meaning that the linear support vector machine (LSVM) classification produces few false positives and false negatives. For the HCV dataset, selecting nine relevant features out of the thirteen available using the feature exclusion method yields a classification accuracy and F1 score of 98.92 percent and 99.02 percent, respectively. The feature inclusion approach yields a slightly lower accuracy of 98.78 percent.
Conclusion: The results reveal the superior strength of the feature selection methods used here for life science datasets with higher-order features, such as protein/gene expression databases. The potential to generalize to other classifiers and to automatically specify the optimal number of features during feature selection makes these approaches flexible for many data mining applications in the life sciences.
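
The filter-then-wrapper idea described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' Matlab implementation. The normalization used here (mutual information divided by the geometric mean of the two entropies) is an assumption, since the abstract does not say which variant was used. The helper names (`rank_features`, `forward_inclusion`, `lookup_accuracy`) and the toy data are hypothetical, and a simple majority-vote lookup scorer stands in for the LSVM classifier so that the example needs only the standard library.

```python
from collections import Counter
from math import log, sqrt


def entropy(values):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def normalized_mutual_information(x, y):
    """NMI = I(x; y) / sqrt(H(x) * H(y)); this normalization is an assumption."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant variable carries no information
    mi = max(0.0, hx + hy - entropy(list(zip(x, y))))  # clamp rounding noise
    return mi / sqrt(hx * hy)


def rank_features(columns, labels):
    """Filter step: feature indices sorted by decreasing NMI with the labels."""
    scores = [normalized_mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


def forward_inclusion(ranked, accuracy_of):
    """Wrapper step: keep a ranked feature only if it raises the accuracy."""
    selected, best = [], 0.0
    for j in ranked:
        acc = accuracy_of(selected + [j])
        if acc > best:
            selected, best = selected + [j], acc
    return selected, best


def lookup_accuracy(columns, labels):
    """Hypothetical stand-in scorer (NOT an LSVM): predicts the majority label
    for each combination of selected feature values, scored on the same data."""
    def accuracy_of(subset):
        groups = {}
        for i, y in enumerate(labels):
            key = tuple(columns[j][i] for j in subset)
            groups.setdefault(key, Counter())[y] += 1
        correct = sum(c.most_common(1)[0][1] for c in groups.values())
        return correct / len(labels)
    return accuracy_of


# Toy data (hypothetical): three discrete features, eight records, two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 0: identical to the labels
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 1: independent of the labels
    [0, 0, 0, 1, 1, 1, 1, 0],  # feature 2: partially informative
]

ranked = rank_features(columns, labels)
selected, best = forward_inclusion(ranked, lookup_accuracy(columns, labels))
print(ranked)          # [0, 2, 1]
print(selected, best)  # [0] 1.0
```

In practice, `accuracy_of` would be replaced by a cross-validated score from a linear SVM, and continuous features would need to be discretized before the mutual information estimate. The exclusion variant mentioned in the abstract would walk the ranking in the opposite direction and drop features whose removal does not reduce accuracy.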


Similar resources

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression provides a wealth of information with thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data by a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...


H-BwoaSvm: A Hybrid Model for Classification and Feature Selection of Mammography Screening Behavior Data

Breast cancer is one of the most common cancers in the world. Early detection of cancer significantly reduces the morbidity rate and treatment costs. Mammography is a known, effective method for diagnosing breast cancer. One way to identify mammography screening behavior is to evaluate women's awareness of participating in mammography screening programs. Today, intelligent systems could...


OPTIMAL SHAPE DESIGN OF GRAVITY DAMS BASED ON A HYBRID META-HERURISTIC METHOD AND WEIGHTED LEAST SQUARES SUPPORT VECTOR MACHINE

A hybrid meta-heuristic optimization method is introduced to efficiently find the optimal shape of concrete gravity dams including dam-water-foundation rock interaction subjected to earthquake loading. The hybrid meta-heuristic optimization method is based on a hybrid of gravitational search algorithm (GSA) and particle swarm optimization (PSO), which is called GSA-PSO. The operation of GSA-PSO...


Sustainable Supplier Selection by a New Hybrid Support Vector-model based on the Cuckoo Optimization Algorithm

For assessing and selecting sustainable suppliers, this study considers a triple-bottom-line approach, including profit, people and planet, and takes into account the business operations, environmental effects, and social responsibilities of the suppliers. Diverse metrics are introduced to measure performance in these three areas. This study develops a new hybrid intelligent model, namely COA-LS-SVM, ...


Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...


Margin-based Feature Selection Techniques for Support Vector Machine Classification

Feature selection for classification working in high-dimensional feature spaces can improve generalization accuracy, reduce classifier complexity, and is also useful for identifying the important feature “markers”, e.g., biomarkers in a bioinformatics or biomedical context. For support vector machine (SVM) classification, a widely used feature selection technique is recursive feature eliminatio...




Journal title

Volume 80, Issue 7

Pages 546-562

Publication date: 2022-10
